Title: Improvements in or relating to binding proteins for recognition of DNA



SECTION 1 

ABSTRACT We have used two selection techniques to study sequence-specific
DNA recognition by the zinc finger, a small, modular DNA-binding
mini-domain. We have chosen zinc fingers since they bind as
independent modules, and so can be linked together in a peptide designed to
bind a predetermined DNA site. In this paper, we describe how a library of
zinc fingers displayed on the surface of bacteriophage enables selection of
fingers capable of binding to given DNA triplets. The amino acid sequences
of selected fingers which bind the same triplet are compared to examine
how sequence-specific DNA recognition occurs. Our results can be
rationalised in terms of coded interactions between zinc fingers and DNA,
involving base contacts from a few α-helical positions. In the following
paper, we describe a complementary technique which confirms the
identity of amino acids capable of DNA sequence discrimination from these
positions.



The manner in which DNA-binding protein domains are able to discriminate 
between different DNA sequences is an important question in understanding crucial 
processes such as the control of gene expression in differentiation and development. The 
zinc finger motif has been studied extensively, with a view to providing some insight into 
this problem, owing to its remarkable prevalence in the eukaryotic genome, and its 
important role in proteins which control gene expression in Drosophila (e.g. 1), the
mouse (2) and humans (3).

Most sequence-specific DNA-binding proteins bind to the DNA double helix by
inserting an α-helix into the major groove (4, 5, 6). Sequence specificity results from
the geometrical and chemical complementarity between the amino acid side chains of the
α-helix and the accessible groups exposed on the edges of base-pairs. In addition to this
direct reading of the DNA sequence, interactions with the DNA backbone stabilise the
complex and are sensitive to the conformation of the nucleic acid, which in turn depends
on the base sequence (7). A priori, a simple set of rules might suffice to explain the
specific association of protein and DNA in all complexes, based on the possibility that
certain amino acid side chains have preferences for particular base-pairs. However,
crystal structures of protein-DNA complexes have shown that proteins can be
idiosyncratic in their mode of DNA recognition, because they use alternative geometries
to present their sensory α-helices to DNA, allowing a variety of different base contacts to
be made by a single amino acid and vice versa (8). Nevertheless, for a family of
transcription factors which use a "probe helix" for binding to the major groove of DNA, it
would seem possible to deduce some general principles (9).

We believe the zinc finger of the TFIIIA class to be a good candidate for deriving a
set of specificity rules owing to its great simplicity of structure and interaction with
DNA. The zinc finger is an independently folding domain which uses a zinc ion to stabilise
the packing of an antiparallel β-sheet against an α-helix (10, 11, 12). The crystal
structures of zinc finger-DNA complexes show a semiconserved pattern of interactions in
which 3 amino acids from the α-helix contact 3 adjacent bases (a triplet) in DNA (13,
14, 15). Thus the mode of DNA recognition is principally a one to one interaction between
amino acids and bases. Because zinc fingers function as independent modules (10, 16),
fingers with different triplet specificities are combined to give specific recognition of
longer DNA sequences. Protein engineering experiments have shown that it is possible to
alter rationally the DNA-binding characteristics of individual zinc fingers when one or
more of the α-helical positions is varied in a number of proteins (17, 18, 19). Because
a large collection of these mutants is accumulating, it has already been possible to
propose some rules relating amino acids on the α-helix to corresponding bases in the
bound DNA sequence (20). However, this approach has two drawbacks. First, the altered
positions on the α-helix are prejudged, making it possible to overlook the role of
positions which are not currently considered important. Second, owing to the importance
of context, concomitant alterations are sometimes required to affect specificity (20), so
that a significant correlation between an amino acid and base may be misconstrued.

An alternative to the rational but biased design of proteins with new specificities
is the isolation of desirable mutants from a large pool. A powerful method of selecting
such proteins is the cloning of peptides (21), or protein domains (22, 23), as fusions to
the minor coat protein (pIII) of bacteriophage fd, which leads to their expression on the
tip of the capsid. Phage displaying the peptides of interest can then be affinity purified
and amplified for use in further rounds of selection and for DNA sequencing of the cloned
gene. We have applied this technology to the study of zinc finger-DNA interactions, after
demonstrating that functional zinc finger proteins can be displayed on the surface of fd
phage, and that the engineered phage can be captured on a solid support coated with
specific DNA. A phage display library has been created comprising variants of the middle
finger from the DNA-binding domain of Zif268 (a mouse transcription factor containing 3
zinc fingers) (2). DNA of fixed sequence is used to purify phage from this library over
several rounds of selection, returning a number of different but related zinc fingers
which bind the given DNA. By comparing similarities in the amino acid sequences of
functionally equivalent fingers we deduce the likely mode of interaction of these fingers
with DNA. Remarkably, it would appear that many base contacts can occur from three
primary positions on the α-helix of the zinc finger, correlating with the implications of
the crystal structure of Zif268 bound to DNA (13). The ability to select or design zinc
fingers with desired specificity means that in the near future, DNA-binding proteins
containing zinc fingers will be made to measure.



MATERIALS AND METHODS 



Construction and cloning of genes. The gene for the first three fingers (residues
3-101) of Transcription Factor IIIA (TFIIIA) was amplified by PCR from the cDNA clone of
TFIIIA using forward and backward primers which contain restriction sites for NotI and
SfiI respectively. The gene for the Zif268 fingers (residues 333-420) was assembled
from 8 overlapping synthetic oligonucleotides, giving SfiI and NotI overhangs. The genes
for fingers of the phage library were synthesised from 4 oligonucleotides by directional
end to end ligation using 3 short complementary linkers, and amplified by PCR from the
single strand using forward and backward primers which contained sites for NotI and SfiI
respectively. Backward PCR primers in addition introduced Met-Ala-Glu as the first
three amino acids of the zinc finger peptides, and these were followed by the residues of
the wild type or library fingers as discussed in the text. Cloning overhangs were produced
by digestion with SfiI and NotI where necessary. Fragments were ligated to 1 µg similarly
prepared Fd-Tet-SN vector. This is a derivative of fd-tet-DOG1 (24) in which a section
of the pelB leader and a restriction site for the enzyme SfiI (underlined) have been added
by site-directed mutagenesis using the oligonucleotide

5'CTCCTGCAGTTGGACCTGTGCC AT GGCCGGCTGGGC CGCATAGAATGGAACAACTAAAGG3'
which anneals in the region of the polylinker (L. Jespers, personal communication).
Electrocompetent DH5α cells were transformed with recombinant vector in 200 ng
aliquots, grown for 1 hour in 2xTY medium with 1% glucose, and plated on TYE
containing 15 µg/ml tetracycline and 1% glucose.

Phage selection. Colonies were transferred from plates to 200 ml 2xTY/Zn/Tet
(2xTY containing 50 µM Zn(CH3.COO)2 and 15 µg/ml tetracycline) and grown overnight.
Phage were purified from the culture supernatant by two rounds of precipitation using
0.2 volumes of 20% PEG/2.5 M NaCl containing 50 µM Zn(CH3.COO)2, and resuspended in
zinc finger phage buffer (20 mM HEPES pH 7.5, 50 mM NaCl, 1 mM MgCl2 and 50 µM
Zn(CH3.COO)2). Streptavidin-coated paramagnetic beads (Dynal) were washed in zinc
finger phage buffer and blocked for 1 hour at room temperature with the same buffer
made up to 6% in fat-free dried milk (Marvel). Selection of phage was over three
rounds: in the first round, beads (1 mg) were saturated with biotinylated oligonucleotide
(~80 nM) and then washed prior to phage binding, but in the second and third rounds
1.7 nM oligonucleotide and 5 µg poly dGC (Sigma) were added to the beads with the phage.
Binding reactions (1.5 ml) for 1 hour at 15°C were in zinc finger phage buffer made up
to 2% in fat-free dried milk (Marvel) and 1% in Tween 20, and typically contained
5x10^ ^ phage. Beads were washed 15 times with 1ml of the same buffer. Phage were 
eluted by shaking in 0.1 M triethylamine for 5 min and neutralised with an equal volume
of 1 M Tris pH 7.4. Log phase E. coli TG1 in 2xTY were infected with eluted phage for
30 min at 37°C and plated as described above. Phage yields were titred by plating serial
dilutions of the infected bacteria. 

Sequencing of selected phage. Single colonies of transformants obtained after
three rounds of selection as described were grown overnight in 2xTY/Zn/Tet. Small
aliquots of the cultures were stored in 15% glycerol at -20°C, to be used as an archive.
Single-stranded DNA was prepared from phage in the culture supernatant and sequenced 
using Sequenase 2.0 (U.S. Biochemical Corp.). 



RESULTS AND DISCUSSION 



Phage display of 3-finger DNA-Binding Domains from TFIIIA or Zif268. 
Prior to the construction of a phage display library, we demonstrated that peptides 
containing three fully functional zinc fingers could be displayed on the surface of viable 
fd phage when cloned in the vector Fd-Tet-SN. In preliminary experiments, we cloned as
fusions to pIII firstly the three N-terminal fingers from TFIIIA (25), and secondly the
three fingers from Zif268 (2), for both of which the DNA binding sites are known.
Peptide fused to the minor coat protein was detected in Western blots using an anti-pIII
antibody (26). Approximately 10-20% of total pIII in phage preparations was present
as fusion protein.

Phage displaying either set of fingers were capable of binding to specific DNA 
oligonucleotides, indicating that zinc fingers were expressed and correctly folded in both 
instances. Paramagnetic beads coated with specific oligonucleotide were used as a medium 
on which to capture DNA-binding phage (Fig. 1A & C), and were consistently able to
return between 100 and 500-fold more such phage, compared to free beads or beads
coated with non-specific DNA. Alternatively, when phage displaying the three fingers of
Zif268 were diluted 1:1.7x10^3 with Fd-Tet-SN phage not bearing zinc fingers, and the
mixture incubated with beads coated with Zif268 operator DNA, one in three of the total
phage eluted and transfected into E. coli were shown by colony hybridisation to carry the
Zif268 gene, indicating an enrichment factor of over 500 for the zinc finger phage. Hence
it is clear that zinc fingers displayed on fd phage are capable of preferential binding to 
DNA sequences with which they can form specific complexes, making possible the 
enrichment of wanted phage by factors of up to 500 in a single affinity purification step. 
Therefore, over multiple rounds of selection and amplification, very rare clones capable 
of sequence-specific DNA binding can be selected from a large library. 
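
The enrichment arithmetic of the dilution experiment can be checked with a short calculation. The sketch below is illustrative only: the function names are hypothetical, enrichment is taken to mean the ratio of the zinc finger phage fraction after and before capture, and the estimate of the number of rounds assumes, as an idealisation not claimed in the text, that the same fold enrichment is obtained in every round.

```python
import math

def enrichment_factor(fraction_before, fraction_after):
    """Fold enrichment, defined here as the ratio of target fractions."""
    return fraction_after / fraction_before

def rounds_needed(pool_size, fold_per_round):
    """Idealised number of rounds for one clone to become the major species."""
    return math.ceil(math.log(pool_size) / math.log(fold_per_round))

# Zif268 phage diluted 1:1.7x10^3 with phage lacking zinc fingers
before = 1 / (1 + 1.7e3)
after = 1 / 3  # one in three eluted phage carried the Zif268 gene
fold = enrichment_factor(before, after)
print(f"enrichment: about {fold:.0f}-fold")

# Assumed example: a single clone in a pool of 2.6e6, enriched 500-fold per round
print("rounds needed:", rounds_needed(2.6e6, 500))
```

Under these assumptions the calculated enrichment is consistent with the reported factor of over 500, and a handful of rounds suffices in principle for a rare clone to dominate a pool of a few million.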

A phage display library of zinc fingers from Zif268. We have made a 
phage display library of the three fingers of Zif268 in which selected residues in the 
middle finger are randomised (Fig. 1B), and have isolated phage bearing zinc fingers with
desired specificity using a modified Zif268 operator sequence (27) in which the middle
DNA triplet is altered to the sequence of interest (Fig. 1C). In order to be able to study
both the primary and secondary putative base recognition positions which are suggested
by database analysis (28), we have designed the library of the middle finger so that,
relative to the first residue in the α-helix (position +1), positions -1 to +8, but
excluding the conserved Leu and His, can be any amino acid except Phe, Tyr, Trp and Cys
which occur rarely at those positions (29). In addition, we have allowed position +9
(which might make an inter-finger contact with Ser at position -2 (13)) to be either 
Arg or Lys, the two most frequently occurring residues at that position. 

The logic of this protocol, based upon the Zif268 crystal structure (13), is that 
the randomised finger is directed to the central triplet since the overall register of 
protein-DNA contacts is fixed by its two neighbours. This enables us to examine which 
amino acids in the randomised finger are the most important in forming specific 
complexes with DNA of known sequence. Since comprehensive variations are programmed 
in all the putative contact positions of the α-helix, we are able to conduct an objective
study of the importance of each position in DNA-binding (28). 

The size of the phage display library required assuming full degeneracy of the 8
variable positions is (16^7 x 2^1 =) 5.4 x 10^8, but because of practical limitations in the
efficiency of transformation with Fd-Tet-SN, we have been able to clone only 2.6 x 10^6 of
these. The library we use is therefore some two hundred times smaller than the
theoretical size necessary to cover all the possible variations of the α-helix. Despite this
shortfall, it has been possible to isolate phage which bind with high affinity and 
specificity to given DNA sequences, demonstrating the remarkable versatility of the zinc 
finger motif. 
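
As a worked check of the library-size figures, the following sketch recomputes the theoretical diversity and the shortfall of the cloned library. The constant names are hypothetical, and the split into seven fully randomised positions plus position +9 is inferred from the library design described above (positions -1 to +8 without the conserved Leu and His, and Arg or Lys at +9).

```python
# Residues allowed at each fully randomised position:
# 20 amino acids minus Phe, Tyr, Trp and Cys
RESIDUES_PER_POSITION = 16
# Positions -1, +1 ... +8 (nine positions) minus the conserved Leu and His
FULLY_RANDOMISED_POSITIONS = 7
# Position +9 is restricted to Arg or Lys
CHOICES_AT_PLUS_9 = 2

theoretical_size = RESIDUES_PER_POSITION ** FULLY_RANDOMISED_POSITIONS * CHOICES_AT_PLUS_9
cloned_size = 2.6e6
shortfall = theoretical_size / cloned_size

print(f"theoretical library size: {theoretical_size:.1e}")
print(f"cloned library is about {shortfall:.0f} times smaller")
```

The result agrees with the statement that the cloned library is some two hundred times smaller than the theoretical size.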

Amino acid-base contacts in zinc finger-DNA complexes deduced 
from phage display selection. Of the 64 base triplets that could possibly form the 
binding site for variations of finger 2, we have so far used 32 in attempts to isolate zinc 
finger phage as described. Results from these selections are shown in Table 1. In general
we have been unable to select zinc fingers which bind specifically to triplets without a 5'
or 3' guanine, all of which return the same limited set of phage after three rounds of
selection (see legend to Table 1). However, for each of the other triplets used to screen the
library, a family of zinc finger phage is recovered. In these families, we find a sequence
bias in the randomised α-helix, which we interpret as revealing the position and identity
of amino acids used to contact the DNA. For instance, the middle fingers from the 8
different clones selected with the triplet GAT (Table 1 d) all have Asn at position +3 and
Arg at position +6, just as does the first zinc finger of the Drosophila protein tramtrack 
in which they are seen making contacts to the same triplet in the cocrystal with specific 
DNA (14). This indicates that the positional recurrence of a particular amino acid in 
functionally equivalent fingers is unlikely to be coincidental, but rather because it has a 
functional role. Thus using data collected from the phage display library (Table 1) it is 
possible to infer most of the specific amino acid-DNA interactions. Remarkably, most of 
the results can be rationalised in terms of contacts from the three primary α-helical
positions (-1, +3 and +6) identified by X-ray crystallography (13) and database 
analysis (28). 

As has been pointed out before (30), guanine has a particularly important role in
zinc finger-DNA interactions. When present at the 5' (e.g. Table 1 c-/) or 3' (e.g.
Table 1 m-d) end of a triplet, G selects fingers with Arg at position +6 or -1 of the
α-helix respectively. When present in the middle position of a triplet (e.g. Table 1 t»), G
prefers His at position +3. Occasionally, G at the 5' end of a triplet selects Ser or Thr at
+6 (e.g. Table 1 p). Since G can only be specified absolutely by Arg (31), this is the most
common determinant at -1 and +6. We can expect this type of contact to be a bidentate
hydrogen bonding interaction as seen in the crystal structures of Zif268 (13) and
tramtrack (14). In these structures, and in almost all of the selected fingers in which
Arg recognises G at the 3' end, Asp occurs at position +2 to buttress the long Arg side
chain (e.g. Table 1 o,p). When position -1 is not Arg, Asp rarely occurs at +2, suggesting
that in this case any other contacts it might make with the second DNA strand do not
contribute significantly to the stability of the protein-DNA complex.

Adenine is also an important determinant of sequence specificity, recognised
almost exclusively by Asn or Gln which again are able to make bidentate contacts (31).
When A is present at the 3' end of a triplet, Gln is often selected at position -1 of the
α-helix, accompanied by small aliphatic residues at +2 (e.g. Table 1 fa). Adenine in the
middle of the triplet strongly selects Asn at +3 (e.g. Table 1 c-e), except in the triplet
GAG (Table 1 a) which selected only two types of finger, both with His at +3 (one being the
wild-type Zif268 which contaminated the library during this experiment). The triplets
ACG (Table 1) and ATG (Table 1 k), which have A at the 5' end, also returned oligoclonal
mixtures of phage, the majority of which were of one clone with Asn at +6.

In theory, cytosine and thymine cannot be reliably discriminated by a hydrogen
bonding amino acid side chain in the major groove (31). Nevertheless, C in the 3'
position of a triplet shows a marked preference for Asp or Glu at position -1, together
with Arg at +1 (e.g. Table 1 e-g). Asp is also sometimes selected at +3 and +6 when C is
in the middle (e.g. Table 1 o) and 5' (e.g. Table 1 a) position respectively. Although Asp can
accept a hydrogen bond from the amino group of C, we should note that the positive
molecular charge of C in the major groove (32) will favour an interaction with Asp
regardless of hydrogen bonding contacts. However, C in the middle position most
frequently selects Thr (e.g. Table 1 /), Val or Leu (e.g. Table 1 o) at +3. Similarly, T in the
middle position most often selects Ser (e.g. Table 1 /), Ala or Val (e.g. Table 1 p) at +3. The
aliphatic amino acids are unable to make hydrogen bonds but Ala probably has a
hydrophobic interaction with the methyl group of T, whereas a longer side chain such as
Leu can exclude T and pack against the ring of C. When T is at the 5' end of a triplet, Ser
and Thr are selected at +6 (as is occasionally the case for G at the 5' end). Thymine at the
3' end of a triplet selects a variety of polar amino acids at -1 (e.g. Table 1 d), and
occasionally returns fingers with Ser at +2 (e.g. Table 1 d) which could make a contact as
seen in the tramtrack crystal structure (14).

Limitations of phage display. From Table 1 it can be seen that a consensus or
bias usually occurs in two of the three primary positions (-1, +3 and +6) for any
family of equivalent fingers, suggesting that in many cases phage selection is by virtue of
only two base contacts per finger as is observed in the Zif268 crystal structure (13).
Accordingly, identical finger sequences are often returned by DNA sequences differing by
one base in the central triplet. One reason for this is that the phage display selection,
being essentially purification by affinity, can yield zinc fingers which bind equally
tightly to a number of DNA triplets and so are unable to discriminate. Secondly, since
complex formation is governed by the law of mass action, affinity selection can favour
those clones whose representation in the library is greatest even though their true
affinity for DNA is less than that of other clones less abundant in the library. Phage
display selection by affinity is therefore of limited value in distinguishing between
permissive and specific interactions beyond those base contacts necessary to stabilise the
complex. Thus in the absence of competition from fingers which are able to bind
specifically to a given DNA, the tightest non-specific complexes will be selected from the
phage library. Consequently, results obtained by phage display selection from a library
must be confirmed by specificity assays, particularly when that library is of limited
size.

Conclusion. The amino acid sequence biases observed within a family of 
functionally equivalent zinc fingers indicate that, of those a-helical positions randomised 
in this study, only three primary (-1, +3 and +6) and one auxiliary (+2) positions are
involved in the recognition of DNA. Moreover, a limited set of amino acids is to be found
at those positions, and we presume that these make contacts to bases. The indications 
therefore are that a code can be derived to describe zinc finger-DNA interactions. At this 
stage however, although sequence homologies are strongly suggestive of amino acid 
preferences for particular base-pairs, we cannot confidently deduce such rules until the 
specificity of individual fingers for DNA triplets is confirmed. We therefore defer 
making a summary table of these preferences until the following paper (33) in which we 
describe how randomised DNA binding sites can be used to this end. 

While this work was in progress, a paper appeared by Rebar and Pabo (34) in
which phage display was also used to select zinc fingers with new DNA-binding
specificities. These authors constructed a library in which the first finger of Zif268 is
randomised, and screened with tetranucleotides to take into account end effects such as
additional contacts from variants of this finger. Only 4 positions (-1, +2, +3 and +6)
were randomised, chosen on the basis of the earlier X-ray crystal structures. The results
of our work, in which more positions were randomised, to some extent justify Rebar and
Pabo's use of the four random positions without apparent loss of effect, although further
selections may reveal that the library is compromised. However, randomising only four
positions decreases the theoretical library size so that full degeneracy can be achieved in
practice. Nevertheless we find that the results obtained by Rebar and Pabo by screening
their complete library with two variant Zif268 operators are in agreement with our
conclusions derived from an incomplete library. On the one hand this again highlights the
versatility of zinc fingers but, remarkably, so far both studies have been unable to
produce fingers which bind to the sequence CCT. It will be interesting to see whether
sequence biases such as we have detected would be revealed, if more selections were 
performed using Rebar and Pabo's library. In any case, it would be desirable to 
investigate the effects on selections of using different numbers of randomised positions in 
more complete libraries than we have used at present. 

FIGURE LEGENDS FOR SECTION 1

FIG. 1 Affinity purification of zinc finger phage. 

(A) Zinc fingers [A] are expressed on the surface of fd phage [B] as fusions to the
minor coat protein [C]. Zinc finger phage are bound to 5'-biotinylated DNA
oligonucleotide [D] attached to streptavidin-coated paramagnetic beads [E], and captured
using a magnet [F]. (Figure adapted from Dynal AS and also Marks et al. (35))

(B) Protein sequence of the three zinc fingers from Zif268 used in the phage
display library. The randomised positions in the α-helix of the second finger have
residues marked 'X'. The amino acid positions are numbered relative to the first helical
residue (position 1). For amino acids at positions -1 to +8, excluding the conserved Leu
and His, codons are equal mixtures of (G,A,C)NN: T in the first base position is omitted in
order to avoid stop codons, but this has the effect that the codons for Trp, Phe, Tyr and
Cys are not represented. Position +9 is specified by the codon A(G,A)G, allowing either
Arg or Lys. Residues of the hydrophobic core are circled, whereas the zinc ligands are
written as white letters on black circles. The positions forming the β-sheets and the
α-helix of a zinc finger are marked below the sequence.
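
The consequences of the (G,A,C)NN codon scheme can be verified by enumerating the codons against the standard genetic code. The sketch below is illustrative: the helper name `encoded_residues` is hypothetical, and the codon table is built from the usual compact 64-letter representation with bases ordered T, C, A, G.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def encoded_residues(first, second, third):
    """One-letter residues (or '*' for stop) encoded by a degenerate codon."""
    return sorted({CODON_TABLE[a + b + c] for a in first for b in second for c in third})

# (G,A,C)NN: any base except T in the first position
library_residues = encoded_residues("GAC", "TCAG", "TCAG")
# A(G,A)G at position +9
plus_9_residues = encoded_residues("A", "GA", "G")

print(len(library_residues), "".join(library_residues))
print(plus_9_residues)
```

Enumerating the codons this way shows no stop codon and no Phe, Tyr, Trp or Cys among the encoded residues, leaving 16 amino acids per randomised position, while A(G,A)G yields only Lys and Arg.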

(C) Sequences of DNA oligonucleotides used to purify (i) phage displaying the
first three fingers of TFIIIA, (ii) phage displaying the three fingers of Zif268, and (iii)
zinc finger phage from the phage display library. The Zif268 consensus operator
sequence used in the X-ray crystal structure (13) is highlighted in (ii), and in (iii)
where 'X' denotes a base change from the ideal operator in oligonucleotides used to purify
phage with new specificities. Biotinylation of one strand is shown by a circled 'B'.

Table 1. Amino acid sequences of the variant a-helical regions from clones of library 
phage selected after 3 rounds using variants of the Zif268 operator. The amino acid 
sequences, aligned in the one letter code, are listed alongside the DNA oligonucleotides 
used in their purification (a-p). The latter are denoted by the sequence of the central 
DNA triplet in the 'bound' strand of the variant Zif268 operator. The amino acid positions 
are numbered relative to the first helical residue (position 1), and the three primary 
recognition positions are highlighted. The accompanying numbers indicate the independent
occurrences of that clone in the sequenced population (5-10 colonies); where numbers
are in parentheses, the clone(s) were detected in the penultimate round of selection but
not in the final round. In addition to the DNA triplets shown here, others were also used in
attempts to select zinc finger phage from the library, but most selected two clones, one
having the α-helical sequence KASNLVSHIR, and the other having LRHNLETHMR. Those
triplets were: ACT, AAA, TTT, CCT, CTT, TTC, AGT, CGA, CAT, AGA, AGC and AAT.
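
The position numbering used in Table 1 can be made concrete by mapping a ten-residue helical sequence onto positions -1 to +9. The sketch below is an illustration under a stated assumption: each listed sequence is taken to span positions -1, +1, +2 ... +9 with no position 0, which is inferred from the library design rather than stated explicitly in the legend. The function names are hypothetical.

```python
# Assumed alignment: ten residues covering -1 and +1 to +9 (there is no position 0)
POSITIONS = [-1, 1, 2, 3, 4, 5, 6, 7, 8, 9]
PRIMARY_POSITIONS = (-1, 3, 6)

def by_position(helix):
    """Map a 10-residue helical sequence onto its numbered positions."""
    if len(helix) != len(POSITIONS):
        raise ValueError("expected a 10-residue helical sequence")
    return dict(zip(POSITIONS, helix))

def primary_residues(helix):
    """Residues at the three primary recognition positions (-1, +3, +6)."""
    numbered = by_position(helix)
    return tuple(numbered[p] for p in PRIMARY_POSITIONS)

# The two clones returned by most of the unsuccessful selections
for sequence in ("KASNLVSHIR", "LRHNLETHMR"):
    print(sequence, primary_residues(sequence))
```

Under this alignment both sequences carry Leu at +4 and His at +7, which fits the conserved Leu and His that were excluded from randomisation, and both have Asn at +3.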

SECTION 2

ABSTRACT In the preceding section we showed how selections from a 
library of zinc fingers displayed on phage yielded fingers able to bind to a 
number of DNA triplets. Here, we describe a new technique to deal 
efficiently with the converse problem, namely the selection of a DNA 
binding site for a given zinc finger. This is done by screening against 
libraries of DNA triplet binding sites randomised in two positions but 
having one base fixed in the third position. The technique is applied here 
to determine the specificity of fingers previously selected by phage 
display. We find that some of these fingers are able to specify a unique 
base in each position of the cognate triplet. This is further illustrated by 
examples of fingers which can discriminate between closely related 
triplets as measured by their respective equilibrium dissociation 
constants. Comparing the amino acid sequences of fingers which specify a 
particular base in a triplet, we infer that in most instances, sequence 
specific binding of zinc fingers to DNA can be achieved using a small set of 
amino acid-base contacts amenable to a code. 

In principle, rules governing protein-DNA interactions can be deduced from a
large database of correlations between the amino acid sequences of the proteins and the 
nucleotide sequences of their optimal binding sites. To this end, we have shown in the 
preceding paper (1) that functionally equivalent zinc fingers which bind to a given DNA 
sequence can be selected from a phage display library. However, determination of the 
optimal binding site for these fingers is still required, as a safeguard against spurious 
selections. One can determine the optimal binding sites of these (and other) proteins, by 
selection from libraries of randomised DNA. This approach, the principle of which is 
essentially the converse of zinc finger phage display, would provide an equally 
informative database from which the same rules can be independently deduced. However, 
until now the favored method for binding site determination, involving iterative selection 
and amplification of target DNA followed by sequencing, is a laborious process not 
conveniently applicable to the analysis of a large database (2, 3). 

We present here a convenient and rapid new method which can reveal the optimal 
binding site(s) of a DNA binding protein by single step selection from small libraries, 
and use this to check the binding site preferences of those zinc fingers selected previously 
by phage display (1). For this application, we use 12 different mini-libraries of the 
Zif268 binding site, each one with the central triplet having one position defined with a
particular base pair and the other two positions randomised. Each library therefore
comprises 16 oligonucleotides and offers a number of potential binding sites to the middle
finger, provided that the latter can tolerate the defined base pair. Each zinc finger phage
is screened against all 12 libraries individually immobilised in wells of a microtitre
plate, and binding is detected by an enzyme immunoassay. Thus a pattern of acceptable
bases at each position is disclosed, which we call a 'binding site signature'. The
information contained in a binding site signature encompasses the repertoire of binding
sites recognised by a zinc finger. 
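
The composition of the twelve mini-libraries follows directly from this design and can be enumerated. The sketch below is illustrative: the function names are hypothetical, position index 0 is taken to be the 5' base of the central triplet (an assumed convention), and only the central triplet is modelled, not the flanking operator sequence.

```python
from itertools import product

BASES = "GATC"

def mini_libraries():
    """One library per (position, fixed base); the other two positions vary."""
    libraries = {}
    for position in range(3):
        for base in BASES:
            libraries[(position, base)] = [
                "".join(triplet)
                for triplet in product(BASES, repeat=3)
                if triplet[position] == base
            ]
    return libraries

def libraries_containing(triplet, libraries):
    """Libraries that offer the given triplet as a potential binding site."""
    return sorted(key for key, members in libraries.items() if triplet in members)

libs = mini_libraries()
print(len(libs), "libraries of sizes", {len(members) for members in libs.values()})
print(libraries_containing("GAT", libs))
```

Each triplet appears in exactly three of the twelve libraries, one per position, so an idealised finger that accepts only a single triplet would be expected to give signal in just three wells.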

The binding site signatures obtained, using zinc finger phage selected as described
in the preceding paper (1), reveal that the selection has yielded some highly
sequence-specific zinc fingers which discriminate at all three positions of a triplet. From
measurements of equilibrium dissociation constants we find that these fingers bind
tightly to the triplets indicated in their signatures, and discriminate against closely
related sites usually by at least a factor of ten. The binding site signatures allow us to
infer rules towards a specificity code for the interactions of zinc fingers with DNA.

MATERIALS AND METHODS



Binding site signatures. Flexible flat-bottomed 96-well microtitre plates
(Falcon) were coated overnight at 4°C with streptavidin (0.1 mg/ml in 0.1 M NaHCO3
pH 8.6, 0.03% NaN3). Wells were blocked for one hour with PBS/Zn (PBS, 50 µM
Zn(CH3.COO)2) containing 2% fat-free dried milk (Marvel), washed 3 times with
PBS/Zn containing 0.1% Tween, and another 3 times with PBS/Zn. The 'bound' strand of
each oligonucleotide library was made synthetically and the other strand extended from a
5'-biotinylated universal primer using DNA polymerase I (Klenow fragment). Fill-in
reactions were added to wells (0.8 pmole DNA library in each) in PBS/Zn for 15
minutes, then washed once with PBS/Zn containing 0.1% Tween, and once again with
PBS/Zn. Overnight bacterial cultures each containing a selected zinc finger phage (1)
were grown in 2xTY containing 50 µM Zn(CH3.COO)2 and 15 µg/ml tetracycline at 30°C.
Culture supernatants containing phage were diluted tenfold by the addition of PBS/Zn
containing 2% fat-free dried milk (Marvel), 1% Tween and 20 µg/ml sonicated salmon
sperm DNA. Diluted phage solutions (50 µl) were applied to wells and binding allowed to
proceed for one hour at 20°C. Unbound phage were removed by washing 5 times with
PBS/Zn containing 1% Tween, and then 3 times with PBS/Zn. Bound phage were detected
as described (4), or using HRP-conjugated anti-M13 IgG (Pharmacia), and quantitated
using SOFTmax 2.32 (Molecular Devices Corp.).

Determination of apparent equilibrium dissociation constants. 
Overnight bacterial cultures were grown in 2xTY/Zn/Tet at 30°C. Culture supernatants 
containing phage were diluted twofold by the addition of PBS/Zn containing 4% fat-free 
dried milk (Marvel), 2% Tween and 40 µg/ml sonicated salmon sperm DNA. Binding 
reactions, containing appropriate concentrations of specific 5'-biotinylated DNA and 
equal volumes of zinc finger phage solution, were allowed to equilibrate for 1 h at 20°C. 
All DNA was captured on streptavidin-coated paramagnetic beads (500 µg per well), 
which were subsequently washed 6 times with PBS/Zn containing 1% Tween, and then 3 
times with PBS/Zn. Bound phage were detected using HRP-conjugated anti-M13 IgG 
(Pharmacia) and developed as described (4). Optical densities were quantitated using 
SOFTmax 2.32 (Molecular Devices Corp.). 

Estimations of the Kd are by fitting to the equation Kd = [DNA].[P]/[DNA.P], using 
the programme KaleidaGraph Version 2.0 (Abelbeck Software). Owing to the sensitivity 
of the ELISA used to detect protein-DNA complex, we can use zinc finger phage 
concentrations far below those of the DNA, as is required for accurate calculations of the 
Kd. The technique we use has the advantage that while the concentration of DNA (variable) 
must be known accurately, that of the zinc fingers (constant) need not be known (5). 
This circumvents the problem of calculating the number of zinc finger peptides expressed 
on the tip of each phage, although since only 10-20% of the gene III protein (pIII) 
carries such peptides we would expect on average less than one copy per phage. Binding is 
performed in solution to prevent any effects caused by the avidity (6) of phage for DNA 
immobilised on a surface. Moreover, in this case measurements of Kd by ELISA are made 
possible since equilibrium is reached in solution prior to capture on the solid phase. 
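
The fitting step can be sketched numerically. What follows is a minimal illustration in Python and not the KaleidaGraph procedure used here. It assumes that, with phage far below the DNA concentration, free [DNA] is approximately equal to total [DNA], so that the ELISA signal follows A = Amax.[DNA]/(Kd + [DNA]). The function names, the search bounds and the choice of a golden-section search are assumptions made for the example.

```python
import math


def fit_amax(kd, dna, signal):
    """Best amplitude for a fixed Kd; the model is linear in Amax."""
    frac = [d / (kd + d) for d in dna]
    return sum(f * s for f, s in zip(frac, signal)) / sum(f * f for f in frac)


def sum_squared_error(kd, dna, signal):
    """Residual sum of squares of A = Amax*[DNA]/(Kd+[DNA]) at the best Amax."""
    amax = fit_amax(kd, dna, signal)
    return sum((s - amax * d / (kd + d)) ** 2 for d, s in zip(dna, signal))


def fit_kd(dna, signal, lo=1e-12, hi=1e-3, iterations=200):
    """Golden-section search over log10(Kd) between assumed bounds lo and hi (molar)."""
    a, b = math.log10(lo), math.log10(hi)
    g = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iterations):
        c = b - g * (b - a)
        d = a + g * (b - a)
        if sum_squared_error(10 ** c, dna, signal) < sum_squared_error(10 ** d, dna, signal):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    kd = 10 ** ((a + b) / 2.0)
    return kd, fit_amax(kd, dna, signal)
```

Only the DNA concentrations enter the fit as known quantities; the phage concentration is absorbed into Amax, which mirrors the point made above that the zinc finger concentration need not be known.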






RESULTS AND DISCUSSION 



The binding site signature of the second finger of Zif268. The top row 
of Fig. 2 shows the signature of the second finger of wild type Zif268. From the pattern of 
strong signals indicating binding to oligonucleotide libraries having GNN, TNN, NGN and 
NNG as the middle triplet, it emerges that the optimal binding site for this finger is 
(T/G)GG, in accord with the published consensus sequence (7). This has implications for 
the interpretation of the X-ray crystal structure of Zif268 solved in complex with a 
consensus operator having TGG as the middle triplet (8). For instance, His at position +3 
of the middle finger was modelled as donating a hydrogen bond to N7 of G, suggesting an 
equivalent contact to be possible with N7 of A, but from the binding site signature we can 
see that there is discrimination against A. This implies that the His may prefer to make a 
hydrogen bond to O6 of G or a bifurcated hydrogen bond to both O6 and N7, or that a steric 
clash with the amino group of A may prevent a tight interaction with this base. Thus by 
considering the stereochemistry of double helical DNA, binding site signatures can give 
insight into the details of zinc finger-DNA interactions. 

Amino acid-base contacts in zinc finger-DNA complexes deduced 
from binding site signatures. The binding site signatures of other zinc fingers 
(Fig. 2) reveal that the phage selections we performed in our previous study (1) have 
yielded highly sequence specific DNA binding proteins. Some of these are able to specify a 
unique sequence for the middle triplet of a variant Zif268 binding site, and are therefore 
more specific than is Zif268 itself for its consensus site. Moreover, one can identify the 
fingers which recognise a particular oligonucleotide library, that is to say a specific base 
at a defined position, by looking down the columns of Fig. 2. By comparing the amino acid 
sequences of these fingers we can identify any residues which have genuine preferences 
for particular bases on bound DNA. With a few exceptions, these are as previously 
predicted on the basis of phage display (1), and are summarised in Table 2. 

The binding site signatures also reveal an important feature of our phage display 
library which is crucial to the interpretation of our selection results. All the fingers in 
our panel, regardless of the amino acid present at position +6, are able to recognise G or 
both G and T at the 5' end of a triplet. Our explanation for this is that the 5' position of the 
middle triplet is fixed as either G or T by a contact from the invariant Asp at position +2 
of finger 3 to the partner of either base on the complementary strand, analogous to those 
seen in the Zif268 (8) and tramtrack (9) crystal structures (a contact to the NH2 of C 
or A respectively in the major groove). Therefore Asp at position +2 of finger 3 is 
dominant over the amino acid present at position +6 of the middle finger, precluding the 
possibility of recognition of A or C at the 5' position. Future libraries must be designed 
with this interaction omitted or the position varied. Interestingly, given the framework 
of the conserved regions of the three fingers, we can identify a rule in the second finger 
which specifies a frequent interaction with both G and T, viz. the occurrence of Ser or 
Thr at position +6, which may donate a hydrogen bond to either base. 

Modulation of base recognition by auxiliary positions. As we have noted 
above, position +2 is able to specify the base directly 3' of the 'cognate triplet', and can 
thus work in conjunction with position +6 of the preceding finger. The binding site 
signatures, whilst pointing to amino acid-base contacts from the three primary 
positions, indicate that auxiliary positions can play other parts in base recognition. A 
clear case in point is Gln at position -1, which is specific for A at the 3' end of a triplet 
when position +2 is a small non polar amino acid such as Ala, though specific for T when 
polar residues such as Ser are at position +2. The strong correlation between Arg at 
position -1 and Asp at position +2, the basis of which is understood from the X-ray 
crystal structures of zinc fingers (8, 9), is another instance of interplay between these 
two positions. Thus the amino acid at position +2 is able to modulate or enhance the 
specificity of the amino acid at other positions. 

At position +3, a different type of modulation is seen in the case of Thr and Val, 
which most often prefer C in the middle position of a triplet, but in some zinc fingers are 
able to recognise both C and T. This ambiguity occurs possibly as a result of different 
hydrophobic interactions involving the methyl groups of these residues, and here a 
flexibility in the inclination of the finger rather than an effect from another position per 
se may be the cause of ambiguous reading. 

Quantitative measurements of dissociation constants. The binding site 
signature of a zinc finger reveals its differential base preferences at a given 
concentration of DNA. As the concentration of DNA is altered, one can expect the binding 
site signature of any clone to change, being more distinctive at low [DNA], and becoming 
less so at higher [DNA] as the Kd of less favourable sites is approached and further bases 
become acceptable at each position of the triplet. Furthermore, because two base 
positions are randomly occupied in any one library of oligonucleotides, binding site 
signatures are not formally able to exclude the possibility of context dependence for some 
interactions. Therefore to supplement binding site signatures, which are essentially 
comparative, quantitative determinations of the equilibrium dissociation constant of each 
phage for different DNA binding sites are required. After phage display selection and 
binding site signatures, these are the third and definitive stage in assessing the 
specificity of zinc fingers. 
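
The dependence of a signature on [DNA] can be illustrated with a short calculation. This is a sketch under stated assumptions: simple single-site binding with occupancy [DNA]/(Kd + [DNA]), and two hypothetical Kd values a factor of ten apart. The numbers and function names are invented for illustration and are not measurements from this study.

```python
def occupancy(dna, kd):
    """Fraction of finger bound at a given DNA concentration (single-site model)."""
    return dna / (kd + dna)


def contrast(dna, kd_preferred, kd_related):
    """Ratio of signals for the preferred site and a closely related site."""
    return occupancy(dna, kd_preferred) / occupancy(dna, kd_related)


# Hypothetical dissociation constants (molar), tenfold apart.
KD_PREFERRED = 1e-8
KD_RELATED = 1e-7

low_dna = contrast(1e-9, KD_PREFERRED, KD_RELATED)   # well below both Kd values
high_dna = contrast(1e-6, KD_PREFERRED, KD_RELATED)  # well above both Kd values
```

Under these assumptions the two sites differ about ninefold in signal at 1 nM DNA but by less than ten per cent at 1 µM, which is why a signature read at a single concentration remains comparative rather than quantitative.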

Examples of such studies presented in Fig. 3 reveal that zinc finger phages bind 
the operators indicated in their binding site signatures with KdS in the range of 10'^- 
10"^ M, and can discriminate against closely related binding sites by factors greater than 
an order of magnitude. Indeed, Fig. 3 shows such differences in affinity for binding sites 
which differ in only one out of nine base pairs. Since the zinc fingers in our panel were 
selected from a library by non-competitive affinity purification, there is the possibility 
that fingers which are even more discriminatory can be isolated using a competitive 
selection process. 

Measurements of dissociation constants allow different triplets to be ranked in 
order of preference according to the strength of binding. The examples here indicate that 
the contacts from either position -1 or +3 can contribute to discrimination. Also, the 
ambiguity in certain binding site signatures referred to above can be shown to have a 
basis in the equal affinity of certain fingers for closely related triplets. This is 
demonstrated by the Kds of the finger containing the amino acid sequence RGDALTSHER for 
the triplets TTG and GTG. 

A code for zinc finger-DNA recognition. One would expect that the 
versatility of the zinc finger motif will have allowed evolution to develop various modes 
of binding to DNA (and even to RNA), which will be too diverse to fall under the scope of a 
single code. However, although a code may not apply to all zinc finger-DNA interactions, 
there is now convincing evidence that a code applies to a substantial subset. This code will 
fall short of being able to predict unfailingly the DNA binding site preference of any given 
zinc finger from its amino acid sequence, but may yet be sufficiently comprehensive to 
allow the design of zinc fingers with specificity for a given DNA sequence. 

Using the selection methods of phage display (1) and of binding site signatures, we 
find that in the case of Zif268-like zinc fingers, DNA recognition involves four fixed 
principal (three primary and one auxiliary) positions on the α-helix, from where a 
limited and specific set of amino acid-base contacts result in recognition of a variety of 
DNA triplets. In other words, a code can describe the interactions of zinc fingers with 
DNA. Towards this code, we can propose amino acid-base contacts for almost all the 
entries in a matrix relating each base to each position of a triplet (Table 2). Where there 
is overlap, our results complement those of Desjarlais and Berg who have derived 
similar rules by altering zinc finger specificity using database-guided mutagenesis (10, 
11). 

Combinatorial use of the coded contacts. The individual base contacts listed 
in Table 2, though part of a code, may not always result in sequence specific binding to 
the expected base triplet when used in any combination. In the first instance we must be 
aware of the possibility that zinc fingers may not be able to recognise certain 
combinations of bases in some triplets by use of this code, or even at all. Otherwise, the 
majority of inconsistencies may be accounted for by considering variations in the 
inclination of the trident reading head of a zinc finger with respect to the triplet with 
which it is interacting. It appears that the identity of an amino acid at any one α-helical 
position is attuned to the identity of the residues at the other two positions to allow three 
base contacts to occur simultaneously. Therefore, for example, in order that Ala may pick 
out T in the triplet GTG, Arg must not be used to recognise G from position +6, since this 
would distance the former too far from the DNA (see for example the finger containing the 
amino acid sequence RGDALTSHER). Secondly, since the pitch of the α-helix is 3.6 amino 
acids per turn, positions -1, +3 and +6 are not an integral number of turns apart, so 
that position +3 is nearer to the DNA than are -1 or +6. Hence, for example, short amino 
acids such as His and Asn, rather than the longer Arg and Gln, are used for the recognition 
of purines in the middle position of a triplet. 
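
The geometric argument can be checked with simple arithmetic. The sketch below assumes an idealised α-helix of 3.6 residues per turn (100° per residue) and the numbering used in Fig. 2, in which position 1 is the first helical residue and there is no position 0, so that -1 directly precedes +1. The function names are hypothetical, and tilt of the helix relative to the DNA is ignored.

```python
RESIDUES_PER_TURN = 3.6


def helix_index(position):
    """Sequential index along the helix; the numbering skips 0, so -1 precedes +1."""
    return position if position < 0 else position - 1


def turns_between(a, b):
    """Number of helical turns separating positions a and b."""
    return (helix_index(b) - helix_index(a)) / RESIDUES_PER_TURN


def angle_relative_to(reference, position):
    """Angular offset in degrees around the helix axis, wrapped into [-180, 180)."""
    degrees = (helix_index(position) - helix_index(reference)) * 360.0 / RESIDUES_PER_TURN
    return (degrees + 180.0) % 360.0 - 180.0
```

On this idealised helix, -1 to +3 and +3 to +6 are each three residues, or about 0.83 of a turn, and positions -1 and +6 sit roughly 60° to either side of +3. Position +3 is therefore central on the face presented to the DNA, which is in keeping with it lying nearest to the bases.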

As a consequence of these distance effects we might say that the code is not really 
'alphabetic' (always identical amino acid:base contact) but rather 'syllabic' (use of a 
small repertoire of amino acid:base contacts). An alphabetic code would involve only four 
rules, but syllabicity adds an additional level of complexity, since systematic 
combinations of rules comprise the code. Nevertheless, the recognition of each triplet is 
still best described by a code of syllables, rather than a catalogue of 'logograms' 
(idiosyncratic amino acid:base contact depending on triplet). 
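
A lookup of this kind can be sketched in a few lines. The example below is illustrative only: it encodes just those contacts that are named in the prose of this section, not the full matrix of Table 2, and the assignment of Arg at position -1 to G is an assumption drawn from the general statement that Arg recognises G. The data structure and function names are hypothetical.

```python
from itertools import product

# Contacts named in the text (one-letter amino acid codes); NOT the full Table 2.
PLUS6_TO_5PRIME = {"R": "G", "S": "GT", "T": "GT"}
PLUS3_TO_MIDDLE = {"H": "G", "N": "A", "A": "T", "S": "T", "T": "C", "D": "C", "V": "CT"}
MINUS1_TO_3PRIME = {"R": "G"}  # assumption: Arg reads G from position -1
# Gln at -1 is modulated by position +2: Ala gives A, Ser gives T.
GLN_MINUS1_BY_PLUS2 = {"A": "A", "S": "T"}


def helix_positions(sequence):
    """Map a 10-residue string spanning helical positions -1 to +9 onto {position: residue}."""
    positions = [-1] + list(range(1, 10))
    if len(sequence) != len(positions):
        raise ValueError("expected residues for positions -1 to +9")
    return dict(zip(positions, sequence))


def predict_triplets(sequence):
    """Return the set of triplets (5' to 3') allowed by the contacts above, or an empty set."""
    p = helix_positions(sequence)
    five_prime = PLUS6_TO_5PRIME.get(p[6])
    middle = PLUS3_TO_MIDDLE.get(p[3])
    if p[-1] == "Q":
        three_prime = GLN_MINUS1_BY_PLUS2.get(p[2])
    else:
        three_prime = MINUS1_TO_3PRIME.get(p[-1])
    if None in (five_prime, middle, three_prime):
        return set()  # a residue not covered by this partial table
    return {"".join(bases) for bases in product(five_prime, middle, three_prime)}
```

Applied to the finger RGDALTSHER, the sketch returns the two triplets GTG and TTG, matching the equal affinity for these sites noted above. It takes no account of the distance effects just described, so it should be read as a bookkeeping aid rather than a predictor.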

Conclusions. The 'syllabic' code of interactions with DNA is made possible by the 
versatile framework of the zinc finger: this allows an adaptability at the interface with 
DNA by slight changes of orientation, which in turn maintains a stoichiometry of one 
coplanar amino acid per base-pair in many different complexes. Given this mode of 
interaction between amino acids and bases it is to be expected that recognition of G and A 
by Arg and Asn/Gln respectively are important features of the code; but remarkably, 
other interactions can be more discriminatory than was anticipated (12). Conversely, it 
is clear that degeneracy can be programmed in the zinc fingers in varying degrees, 
allowing for intricate interactions with different regulatory DNA sequences (13, 7). One 
can see how this principle makes possible the regulation of differential gene expression 
by a limited set of transcription factors. 

As we have noted, the versatility of the finger motif will likely allow other modes 
of binding to DNA. Similarly, we must take into account the malleability of nucleic acids, 
such as is observed in Fairall et al. (9) where a deformation of the double helix at a 
flexible base step allows a direct contact from Ser at position +2 of finger 1 to a T at the 
3' position of the cognate triplet. Even in our selections there are instances of fingers 
whose binding mode is obscure, and may require structural analyses for clarification. 
Thus, water may be seen to play an important role, for example where short side chains 
such as Asp, Asn or Ser interact with bases from position -1 (14, 15). 

Eventually, it might be possible to develop a number of codes describing zinc 
finger binding to DNA, which could predict the binding site preferences of some zinc 
fingers from their amino acid sequence. The functional amino acids selected at positions 
-1, +3 and to an extent +6 in this study, are very frequently observed at the same 
positions in naturally occurring fingers (e.g. see Fig. 4 in Desjarlais and Berg (16)), 
supporting the existence of coded contacts from these three positions. However, the lack 
of definitive predictive methods is not a serious practical limitation as current 
laboratory techniques (here and in (2, 3)) will allow the identification of binding sites 
for a given DNA-binding protein. Rather, we can apply phage selection and a knowledge of 
the recognition rules to the converse problem, namely the design of proteins to bind 
predetermined DNA sites. 

Prospects for the design of DNA-binding proteins. The ability to 
manipulate the sequence specificity of zinc fingers implies that we are on the eve of 
designing DNA-binding proteins with desired specificity for applications in medicine and 
research (11, 17). This is possible because, by contrast to all other DNA-binding 
motifs, we can avail ourselves of the modular nature of the zinc finger, since DNA sites 
can be recognised by appropriate combinations of independently acting fingers linked in 
tandem. 

The coded interactions of zinc fingers with DNA can be used to model the 
specificity of individual zinc fingers de novo, or more likely in conjunction with phage 
display selection of suitable candidates. In this way, according to requirements, one could 
modulate the affinity for a given binding site, or even engineer an appropriate degree of 
indiscrimination at particular base positions. Moreover, the additive effect of multiply 
repeated domains offers the opportunity to bind specifically and tightly to extended, and 
hence very rare, genomic loci. Thus zinc finger proteins might well be a good alternative 
to the use of antisense nucleic acids in suppressing or modifying the action of a given 
gene, whether normal or mutant. To this end, extra functions could be introduced to these 
DNA binding domains by appending suitable natural or synthetic effectors. 

FIGURE LEGENDS FOR SECTION 2 

FIG 2. Binding site signatures of individual zinc finger phage. The diagram is of raw data 
and represents binding of zinc finger phage to randomised DNA immobilised in the wells 
of microtitre plates. To test each zinc finger phage against each oligonucleotide library 
(see text), DNA libraries are applied to columns of wells (down the plate), while rows of 
wells (across the plate) contain equal volumes of a solution of a zinc finger phage. The 
identity of each library is given as the middle triplet of the 'bound' strand of Zif268 
operator, where N represents a mixture of all 4 nucleotides. The zinc finger phage is 
specified by the sequence of the variable region of the middle finger, numbered relative to 
the first helical residue (position 1), and the three primary recognition positions are 
highlighted. Bound phage are detected by an enzyme immunoassay. The approximate 
strength of binding is indicated by a grey scale proportional to the enzyme activity. From 
the pattern of binding to DNA libraries, called the 'signature' of each clone, one or a small 
number of binding sites can be read off and these are written on the right of the figure. 

FIG 3. Determination of apparent equilibrium dissociation constants of zinc finger phage 
for variants of the Zif268 binding site, showing discrimination of closely related triplets 
by the middle finger, usually by factors of >10. The two outer fingers carry the native 
sequence, as do the two cognate outer DNA triplets. The sequence of amino acids occupying 
helical positions -1 to +9 of the varied middle finger is shown in each case. 

Table 2. Summary of frequently observed amino acid-base contacts in interactions of 
selected zinc fingers with DNA. The given contacts comprise a syllabic recognition code 
(see text) for appropriate triplets. Cognate amino acids and their positions in the α-helix 
are entered in a matrix relating each base to each position of a triplet. Auxiliary amino 
acids from position +2 can enhance or modulate specificity of amino acids at position -1, 
and these are listed as pairs. Ser or Thr at position +6 permit Asp +2 of the following 
finger (denoted Asp++2) to specify both G and T indirectly, and the pairs are listed. The 
specificity of Ser+3 for T and Thr+3 for C may be interchangeable in rare instances, 
while Val+3 appears to be consistently ambiguous. 

SECTION 3 

Recently we have proposed that specific DNA-binding proteins comprising zinc fingers 
can be made to measure. To demonstrate their potential we have created a three 
finger peptide able to bind site-specifically to a unique 9bp region of a BCR-ABL fusion 
oncogene and to discriminate it from the parent genomic sequences. Using 
transformed cells in culture as a model, we show that binding to the target oncogene in 
chromosomal DNA is possible, resulting in blockage of transcription. Consequently, 
murine cells made growth factor-independent by the action of the oncogene are found 
to revert to factor dependence on transient transfection with a vector expressing the 
peptide. 

DNA-binding proteins designed to recognise specific DNA sequences could be 
incorporated in chimeric transcription factors, recombinases, nucleases etc, for a wide range 
of applications. We have shown that zinc finger mini-domains can discriminate between 
closely related DNA triplets, and have proposed that they can be linked together to form 
domains for the specific recognition of longer DNA sequences 1,2. One interesting 
possibility for the use of such protein domains is to target selectively genetic differences in 
pathogens or transformed cells. Here we report one such application. 

There exist a set of human leukaemias in which a reciprocal chromosomal 
translocation t(9;22)(q34;q11) results in a truncated chromosome 22, the Philadelphia 
chromosome (Ph1), encoding at the breakpoint a fusion of sequences from the c-ABL 
proto-oncogene and the BCR gene. In chronic myelogenous leukaemia (CML), the breakpoints 
usually occur in the first intron of the c-ABL gene and in the breakpoint cluster region of the 
BCR gene, and give rise to a p210BCR-ABL gene product. Alternatively, in acute 
lymphoblastic leukaemia (ALL), the breakpoints usually occur in the first introns of both 
BCR and c-ABL and result in a p190BCR-ABL gene product (Fig. 4). Facsimiles of 
these rearranged genes act as dominant transforming oncogenes in cell culture and 
transgenic mice. Like their genomic counterparts, the cDNAs bear a unique nucleotide 
sequence at the fusion point of the BCR and c-ABL genes, which can be recognised at the 
DNA level by a site-specific DNA-binding protein. We have designed such a protein to 
recognise the unique fusion site in the p190BCR-ABL cDNA. This fusion is obviously distinct 
from the breakpoints in the spontaneous genomic translocations, which are thought to be 
variable among patients. Although the design of such peptides has implications for cancer 
research, our primary aim here is to prove the principle of protein design, and to assess the 
feasibility of in vivo binding to chromosomal DNA in available model systems. 

The DNA-binding proteins we create are composed of classical zinc fingers. 
These small motifs are ideal natural building blocks for de novo protein design since they 
function as independent modules but can be connected by a well known linker to 
allow recognition of long, asymmetric DNA sequences. Lately it has been possible to isolate 
zinc fingers which bind to given DNA triplets by selection from phage display libraries of 
randomised zinc fingers. The specificity of selected fingers is checked by a second 
selection technique called the 'binding site signature' in which these fingers are used to screen 
libraries of randomised oligonucleotide binding sites, thus identifying fingers which can 
specify a unique base triplet. From these and other studies elements of a recognition 
code have emerged which relate the amino acid sequence of zinc fingers to their cognate 
triplet. 

The strategy we use in creating DNA-binding proteins combines phage display 
selection and rational design based on the available recognition rules. A nine base-pair target 
sequence (GCA,GAA,GCC) for a three zinc finger peptide was chosen which spanned the 
fusion point of the p190BCR-ABL cDNA. The three triplets forming this binding site were 
each used to screen a zinc finger phage library over three rounds as described. The selected 
fingers were then analysed by binding site signatures to reveal their preferred triplet, and 
mutations to improve specificity were made to the finger selected for binding to GCA 
(Fig. 5). A phage display mini-library of putative BCR-ABL-binding three-finger proteins was 
cloned in fd phage, comprising six possible combinations of the six selected or designed 
fingers (1A, 1B; 2A; 3A, 3B and 3C) linked in the appropriate order. The mini-library was 
screened once with an oligonucleotide containing the 9 base-pair BCR-ABL target sequence, 
to select for tight binding clones over weak binders and background vector phage. Because 
the library was small, we did not include competitor DNA sequences for homologous regions 
of the genomic BCR and c-ABL genes, but instead checked the selected clones for their 
ability to discriminate. We found that although all the selected clones were able to bind the 
BCR-ABL target sequence and to discriminate between this and the genomic BCR sequence, 
only a subset could discriminate against the c-ABL sequence which, at the junction between 
intron 1 and exon 2, has an 8/9 base-pair homology to the BCR-ABL target sequence. 
Sequencing of the discriminating clones revealed two types of selected peptide, one with the 
composition 1A-2A-3B and the other with 1B-2A-3B. Thus both peptides carried the third 
finger (3B) which was specifically designed against the triplet GCA but peptide 1A-2A-3B 
was able to bind to the BCR-ABL target sequence with higher affinity than was peptide 
1B-2A-3B. 

The peptide 1A-2A-3B, which we will henceforth refer to as the anti-BCR-ABL 
peptide, was used in further experiments. The anti-BCR-ABL peptide has an apparent 
equilibrium dissociation constant (Kd) of 6.2±0.4 x IQ-'^M for the p190BCR-ABL cDNA 
sequence in vitro, and discriminates against the similar sequences found in genomic BCR and 
c-ABL DNA, by factors greater than an order of magnitude (Fig. 6). The measured 
dissociation constant is higher than that of three-finger peptides from naturally occurring 
proteins such as Sp1 or Zif268 which have Kds in the range of IQ-^M, but rather is 
comparable to that of the two fingers from the tramtrack (ttk) protein. However, the 
affinity of the anti-BCR-ABL peptide could be refined, if desired, by site-directed mutations 
or by 'affinity maturation' of a phage display library. 
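
The composition of the mini-library from which this peptide was drawn can be enumerated directly. The short sketch below simply lists the combinations of the six fingers named above, one choice per finger position, and pairs each finger position with the triplet it was selected against; the variable and function names are hypothetical.

```python
from itertools import product

# Fingers available at each position of the three-finger peptide.
FINGER_CHOICES = {1: ["1A", "1B"], 2: ["2A"], 3: ["3A", "3B", "3C"]}

# Target written 5' to 3'; finger 3 reads the 5' triplet and finger 1 the 3' triplet.
TARGET = "GCAGAAGCC"


def mini_library():
    """All three-finger combinations, written in the style 1A-2A-3B."""
    return ["-".join(combo) for combo in product(*(FINGER_CHOICES[i] for i in (1, 2, 3)))]


def triplet_for_finger(finger_number):
    """Triplet of the target assigned to finger 1, 2 or 3."""
    triplets = [TARGET[i:i + 3] for i in range(0, len(TARGET), 3)]  # GCA, GAA, GCC
    return triplets[3 - finger_number]
```

Two choices for finger 1, one for finger 2 and three for finger 3 give the six members of the library, of which 1A-2A-3B and 1B-2A-3B were recovered as discriminating clones.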

Having established DNA discrimination in vitro, we wished to test whether the 
anti-BCR-ABL peptide was capable of site-specific DNA-binding in vivo. The peptide was fused 
to the VP16 activation domain from herpes simplex virus and used in transient 
transfection assays (Fig. 7) to drive production of a CAT (chloramphenicol acetyl 
transferase) reporter gene from a binding site upstream of the TATA box. A thirty-fold 
increase in CAT activity was observed in cells cotransfected with reporter plasmid bearing 
copies of the p190BCR-ABL cDNA target site, compared to a barely detectable increase in 
cells cotransfected with reporter plasmid bearing copies of either the BCR or c-ABL 
semihomologous sequences. The selective stimulation of transcription indicates convincingly 
that highly site-specific DNA-binding can occur in vivo. However, while transient 
transfections assay binding to plasmid DNA, the true target site for this and most other 
DNA-binding proteins is in genomic DNA. This might well present significant problems, not least 
since this DNA is physically separated from the cytosol by the nuclear membrane, but also 
since it may be packaged within chromatin. 

To study whether genomic targeting is possible, we made a construct in which our 
anti-BCR-ABL peptide was flanked at the N-terminus with the nuclear localisation signal 
from the large T antigen of SV40 virus and at the C-terminus with an 11 amino acid 
c-myc epitope tag recognisable by the 9E10 antibody. This construct was used to transiently 
transfect the IL-3-dependent murine cell line Ba/F3 or alternatively Ba/F3+p190 and 
Ba/F3+p210 cell lines previously made IL-3-independent by integrated plasmid constructs 
expressing either p190BCR-ABL or p210BCR-ABL respectively (ISG et al., in preparation). 
Staining of the cells with the 9E10 antibody followed by a secondary fluorescent conjugate 
showed efficient nuclear localisation in those cells transfected with the anti-BCR-ABL 
peptide (Fig. 8). The efficiency of transient transfection, measured as the proportion of 
immunofluorescent cells in the population, was 15-20%. When IL-3 is withdrawn from tissue 
culture, a corresponding proportion of Ba/F3+p190 cells are found to have reverted to factor 
dependence and die, while Ba/F3+p210 cells are unaffected (Fig. 9a). Immunofluorescence 
microscopy of transfected Ba/F3+p190 cells in the absence of IL-3 shows chromatin 
condensation and nuclear fragmentation into small apoptotic bodies, while the nuclei of 
Ba/F3+p210 cells remain intact (Fig. 8). Northern blots of total cytoplasmic RNA from 
Ba/F3+p190 cells transiently transfected with the anti-BCR-ABL peptide revealed reduced 
levels of p190BCR-ABL mRNA relative to untransfected cells. By contrast, similarly 
transfected Ba/F3+p210 cells showed no decrease in the levels of p210BCR-ABL mRNA 
(Fig. 9b). 

Hence a DNA-binding protein designed to recognise a specific DNA sequence in 
vitro, is active in vivo where, directed to the nucleus by an appended localisation signal, it 
can bind its target sequence in chromosomal DNA. This is found on otherwise actively 
transcribing DNA, so presumably binding of the peptide blocks the path of the polymerase, 
causing stalling or abortion. The use of a specific polypeptide in this case to target intragenic 
sequences is reminiscent of antisense oligonucleotide- or ribozyme-based approaches to 
inhibiting the expression of selected genes. Like antisense oligonucleotides, zinc finger 
DNA-binding proteins can be tailored against genes altered by chromosomal translocations, 
or point mutations, as well as to regulatory sequences within genes. Also, like 
oligonucleotides which can be designed to repress transcription by triple helix formation in 
homopurine-homopyrimidine promoters, DNA-binding proteins can bind to various 
unique regions outside genes, but in contrast they can direct gene expression by both up- or 
down-regulating the initiation of transcription when fused to activation or repression 
domains. In any case, by acting directly on any DNA, and by allowing fusion to a variety 
of protein effectors, tailored site-specific DNA-binding proteins have the potential to control 
gene expression, and indeed to manipulate the genetic material itself, in medicine and 
research. 

FIGURE LEGENDS FOR SECTION 3 

Fig. 4. Nucleotide sequences of the fusion point between BCR and ABL sequences in 
p190 cDNA, and of the corresponding exon boundaries in the BCR and c-ABL genes. 
Exon sequences are written in capital letters while introns are given in lowercase. Line 1, 
p190BCR-ABL cDNA; line 2, BCR genomic sequence at junction of exon 1 and intron 1; line 
3, ABL genomic sequence at junction of intron 1 and exon 2. The 9bp target sequence in 
the p190BCR-ABL cDNA is underlined, as are the homologous sequences in genomic BCR and 
c-ABL. 

Fig. 5. Amino acid sequences of zinc fingers used in constructing the mini-library of 
putative BCR-ABL binders. Regions of secondary structure are underlined below the list, 
while residue positions are given above, relative to the first position of the α-helix (position 
1). Zinc finger phages were selected from a library of 2.6x10^ variants, using three DNA 
binding sites each containing one of the triplets GCC, GAA or GCA 1. Binding site 
signatures (data not shown) indicate that fingers 1A and 1B specify the triplet GCC, finger 
2A specifies GAA, while the fingers selected using the triplet GCA all prefer binding to GCT. 
Amongst the latter is finger 3A, the specificity of which we believed, on the basis of 
recognition rules, could be changed by a point mutation. Finger 3B, based on the selected 
finger 3A, but in which Gln at helical position +2 was altered to Ala, should be specific for 
GCA. Finger 3C is an alternative version of finger 3A, in which the recognition of C is 
mediated by Asp+3 rather than by Thr+3. 

Fig. 6. Discrimination in the binding of the anti-BCR-ABL peptide to its p190BCR-ABL 
target site and to like regions of genomic BCR and c-ABL. The graph shows binding 
(measured as an A450-650) at various [DNA]. Binding reactions and complex detection by 
enzyme immunoassay were performed as described 2, and a full curve analysis was used in 
calculations of the Kd. The DNAs used were oligonucleotides spanning 9bp either side of 
the fusion point in the cDNA or the exon boundaries. The anti-BCR-ABL peptide binds to its 
intended target site with a Kd= 6.2±0.4 x10'^M, and is able to discriminate against genomic 
BCR and c-ABL sequences, though the latter differs by only one base pair in the bound 9bp 
region. 
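
The single-base difference noted above can be checked by counting mismatches between the 
target site and the two genomic sequences. The sketch below is illustrative only: it takes 
the three 10-base sequences listed in the METHODS of Fig. 7 and assumes that the bound 9bp 
region is the last nine bases of each (GCA GAA GCC in the target, i.e. the triplets named in 
Fig. 5). The function and variable names are hypothetical. 

```python
# Sequences as listed in the METHODS of Fig. 7 (10 bases each).
SITES = {
    "p190BCR-ABL cDNA target": "CGCAGAAGCC",
    "c-ABL exon-intron junction": "TCCAGAAGCC",
    "BCR exon-intron junction": "CGCAGGTGAG",
}


def bound_region(site: str, length: int = 9) -> str:
    """Return the last `length` bases of a site.

    Assumption: the peptide contacts the final nine bases (GCA GAA GCC in the target).
    """
    if len(site) < length:
        raise ValueError("site is shorter than the requested region")
    return site[-length:]


def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ (Hamming distance)."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def mismatch_table(sites: dict, target_name: str) -> dict:
    """Mismatch count of every site against the named target, over the bound region."""
    target = bound_region(sites[target_name])
    return {name: mismatches(bound_region(seq), target) for name, seq in sites.items()}


if __name__ == "__main__":
    for name, count in mismatch_table(SITES, "p190BCR-ABL cDNA target").items():
        print(f"{name}: {count} mismatch(es)")
```

Under that assumption the c-ABL sequence differs from the target at one position and the BCR 
sequence at four. 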

Fig. 7. Transactivation reporter assays using an anti-BCR-ABL peptide fused to the 
VP16 activation domain. C3H10T1/2 cells were transiently cotransfected with a CAT 
reporter plasmid and an anti-BCR-ABL/VP16 expression vector (pZNlA). The top panel of 
the figure shows the results of thin layer chromatography of samples from different 
transfections, in which the fold induction of CAT activity relative to a sample where reporter 
alone was transfected (panel 1) is plotted on a histogram below. A specific (thirty-fold) 
increase in CAT activity was observed in cells cotransfected with reporter plasmid bearing 
copies of the p190BCR-ABL cDNA target site, indicating in vivo binding. The particular 
constructs used in different transfections are noted below the histogram. 
METHODS: Reporter plasmids pMCAT6BA, pMCAT6A, and pMCAT6B were constructed 
by inserting 6 copies of the p190BCR-ABL target site (CGCAGAAGCC), the c-ABL second 
exon-intron junction sequence (TCCAGAAGCC), or the BCR first exon-intron junction 
sequence (CGCAGGTGAG) respectively, into pMCAT3. The anti-BCR-ABL/VP16 
expression vector was generated by inserting the in-frame fusion between the activation 
domain of herpes simplex virus VP16 and the Zn finger peptide in the pEF-BOS vector. 
C3H10T1/2 cells were transiently co-transfected with 10 µg of reporter plasmid and 10 
µg of expression vector. RSVL, which contains the Rous sarcoma virus long terminal 
repeat linked to luciferase, was used as an internal control to normalise for differences in 
transfection efficiency. Cells were transfected by the calcium phosphate precipitation method 
and CAT assays performed as described. Plasmid pG5EC, which has five consensus 17-mer 
GAL4-binding sites upstream from the minimal promoter of the adenovirus E1b TATA 
box, and pM1VP16 vector, which encodes an in-frame fusion between the DNA-binding 
domain of GAL4 and the activation domain of herpes simplex virus VP16, were used as a 
positive control. 
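
The fold induction plotted in the histogram can be illustrated with a short calculation. This 
is a minimal sketch, assuming that normalisation to the internal control is a simple ratio of 
CAT activity to luciferase activity and that fold induction is the normalised value divided 
by that of the reporter-alone sample; the text does not state the exact arithmetic used. The 
function names and the numbers in the usage lines are hypothetical and are not measured 
data. 

```python
def normalised_cat(cat_activity: float, luciferase_activity: float) -> float:
    """CAT activity divided by the luciferase internal control (assumed normalisation)."""
    if luciferase_activity <= 0:
        raise ValueError("luciferase activity must be positive")
    return cat_activity / luciferase_activity


def fold_induction(sample: tuple, reporter_alone: tuple) -> float:
    """Normalised CAT activity of a sample relative to the reporter-alone sample.

    Each argument is a (cat_activity, luciferase_activity) pair.
    """
    baseline = normalised_cat(*reporter_alone)
    if baseline <= 0:
        raise ValueError("reporter-alone activity must be positive")
    return normalised_cat(*sample) / baseline


if __name__ == "__main__":
    # Made-up readings, for illustration only.
    reporter_alone = (6.0, 2.0)      # normalised activity 3.0
    with_expression = (90.0, 3.0)    # normalised activity 30.0
    print(fold_induction(with_expression, reporter_alone))
```

Dividing by the luciferase reading first means that a transfection which simply took up more 
DNA does not appear as a higher induction. 
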
Fig. 8. Immunofluorescence of Ba/F3+p190 and Ba/F3+p210 cells transiently 
transfected with the anti-BCR-ABL expression vector and stained with the 9E10 antibody. 
The image shows expression and nuclear localisation of the anti-BCR-ABL peptide (panels 
B, C, and D). In addition, transfected Ba/F3+p190 cells show chromatin condensation and 
nuclear fragmentation into small apoptotic bodies (panels B and C), whereas neither 
untransfected Ba/F3+p190 cells (panel A) nor transfected Ba/F3+p210 cells (panel D) do. 
METHODS: The anti-BCR-ABL expression vector was generated in the pEF-BOS vector, 
including an 11 amino acid c-myc epitope tag (EQKLISEEDLN) at the carboxy-terminal 
end, recognizable by the 9E10 antibody, and the nuclear localization signal PKKKRKV of 
the large T antigen of SV40 virus at the amino-terminal end. Three glycine residues were 
introduced downstream of the nuclear localization signal as a spacer, to ensure exposure of 
the nuclear leader from the folded molecule. Ba/F3 cells were transfected with 25 µg of the 
anti-BCR-ABL expression construct tagged with the 9E10 c-myc epitope as described, and 
protein production was analyzed 48 h later by immunofluorescence labeling as follows. Cells 
were fixed in 4% (w/v) paraformaldehyde for 15 min, washed in phosphate-buffered saline 
(PBS), and permeabilized in methanol for 2 min. After blocking in 10% fetal calf serum in 
PBS for 30 min, the mouse 9E10 antibody was added. After a 30 min incubation at room 
temperature, a fluorescein isothiocyanate (FITC)-conjugated goat anti-mouse IgG (SIGMA) 
was added and incubated for a further 30 min. Fluorescent cells were visualized using a 
confocal scanning microscope (magnification, 200X). 

Fig. 9a. Viability in the absence of IL-3 of transformed Ba/F3 cells transiently 
transfected with a vector expressing anti-BCR-ABL peptide. The Ba/F3 cell line is 
dependent on IL-3 for growth, but becomes IL-3 independent when stably transformed by 
p190BCR-ABL or p210BCR-ABL cDNA (4, and ISO et al., in preparation). A proportion of 
Ba/F3+p190 cells transfected with the anti-BCR-ABL expression vector revert to IL-3 
dependence, while similarly transfected Ba/F3+p210 cells are unaffected. 
METHODS: Cell lines Ba/F3, Ba/F3+p190 and Ba/F3+p210 were maintained in Dulbecco's 
modified Eagle's medium (DMEM) supplemented with 10% fetal bovine serum. In the case 
of the Ba/F3 cell line, 10% WEHI-3B-conditioned medium was included as a source of IL-3. 
After the transfection with the anti-BCR-ABL expression vector, cells (5x10^/ml) were 
washed twice in serum-free medium and cultured in DMEM medium with 10% fetal bovine 
serum without WEHI-3B-conditioned medium. Percentage viability was determined by 
trypan blue exclusion. Data are expressed as means of triplicate cultures. 
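
The viability figures can be illustrated as follows. This is a minimal sketch, assuming that 
percentage viability is the fraction of counted cells that exclude the dye (unstained cells) 
and that each plotted point is the arithmetic mean over three cultures. The function names 
and the counts in the usage lines are hypothetical and are not data from the experiment. 

```python
from statistics import mean


def percent_viability(unstained: int, stained: int) -> float:
    """Percentage of counted cells that exclude trypan blue (unstained = viable)."""
    total = unstained + stained
    if total == 0:
        raise ValueError("no cells counted")
    return 100.0 * unstained / total


def mean_viability(counts: list) -> float:
    """Mean percentage viability over replicate cultures.

    `counts` holds one (unstained, stained) pair per culture.
    """
    return mean(percent_viability(u, s) for u, s in counts)


if __name__ == "__main__":
    # Made-up counts for three replicate cultures, for illustration only.
    triplicate = [(80, 20), (70, 30), (90, 10)]
    print(mean_viability(triplicate))
```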

Fig. 9b. Northern filter hybridisation analysis of Ba/F3+p190 and Ba/F3+p210 cell lines 
transfected with the anti-BCR-ABL expression vector. Lane 1 is from the untransfected 
Ba/F3+p190 cell line; lanes 2 and 3 are from the Ba/F3+p190 cell line transfected with the 
anti-BCR-ABL expression vector; lane 4 is from the untransfected Ba/F3+p210 cell line; lanes 
5 and 6 are from the Ba/F3+p210 cell line transfected with the anti-BCR-ABL expression vector. 
When transfected with the anti-BCR-ABL expression vector, a specific downregulation of 
p190BCR-ABL mRNA is seen in Ba/F3+p190 cells, while expression of p210BCR-ABL is 
unaffected in Ba/F3+p210 cells. 

METHODS: 10 µg of total cytoplasmic RNA, from the cells indicated, was glyoxylated and 
fractionated in 1.4% agarose gels in 10 mM NaPO4 buffer, pH 7.0. After electrophoresis the gel 
was blotted onto Hybond-N (Amersham), UV-crosslinked and hybridized to a 32P-labelled 
c-ABL probe. Autoradiography was for 14 h at -70 °C. Loading was monitored by reprobing 
the filters with a mouse β-actin cDNA. 